Smart Paper Notes

Forecasting through deep learning and modal decomposition in multi-phase concentric jets

León Mata , Rodrigo Abadía-Heredia , Manuel Lopez-Martin , José M. Pérez , Soledad Le Clainche

Category: Machine Learning

2022-12-24

This work presents a set of neural network (NN) models specifically designed for accurate and efficient fluid dynamics forecasting. We show how neural network training can be improved by reducing data complexity through a modal decomposition technique called higher order dynamic mode decomposition (HODMD), which identifies the main structures in the flow dynamics and reconstructs the original flow using only these structures. The reconstruction has the same number of samples and the same spatial dimension as the original flow, but its dynamics are less complex while its main features are preserved. We also show the low computational cost of the proposed NN models in both their training and inference phases. The core idea of this work is to test the limits of applicability of deep learning models to data forecasting in complex fluid dynamics problems. The generalization capabilities of the models are demonstrated by using the same neural network architectures to forecast the future dynamics of four different multi-phase flows. The data sets used to train and test these deep learning models come from Direct Numerical Simulations (DNS) of these flows.

Translated by Google Translate

Sketch-and-solve approaches to k-means clustering by semidefinite programming

Charles Clum , Dustin G. Mixon , Soledad Villar , Kaiying Xie

Categories: Machine Learning | Machine Learning (Statistics)

2022-11-28

We introduce a sketch-and-solve approach to speed up the Peng-Wei semidefinite relaxation of k-means clustering. When the data is appropriately separated, we identify the k-means optimal clustering. Otherwise, our approach provides a high-confidence lower bound on the optimal k-means value. This lower bound is data-driven; it makes no assumption about the data or how it is generated. We provide code and an extensive set of numerical experiments where we use this approach to certify approximate optimality of clustering solutions obtained by k-means++.


Graph Neural Networks for Community Detection on Sparse Graphs

Luana Ruiz , Ningyuan Huang , Soledad Villar

Category: Machine Learning

2022-11-06

Spectral methods provide consistent estimators for community detection in dense graphs. However, their performance deteriorates as the graphs become sparser. In this work we consider a random graph model that can produce graphs at different levels of sparsity, and we show that graph neural networks can outperform spectral methods on sparse graphs. We illustrate the results with numerical examples in both synthetic and real graphs.
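The spectral baseline mentioned in this abstract can be illustrated with a minimal sketch. It assumes a two-community stochastic block model and a sign split on the eigenvector of the second-largest adjacency eigenvalue; the function names, graph size, and edge probabilities are illustrative choices, not taken from the paper, and no graph neural network is implemented here.

```python
import numpy as np

def sample_sbm(n, p_in, p_out, rng):
    """Sample a symmetric two-community stochastic block model (n must be even)."""
    labels = np.repeat([0, 1], n // 2)
    same_community = labels[:, None] == labels[None, :]
    edge_probs = np.where(same_community, p_in, p_out)
    # Draw the strict upper triangle only, then mirror it (no self-loops).
    upper = np.triu(rng.random((n, n)) < edge_probs, k=1)
    adjacency = (upper | upper.T).astype(float)
    return adjacency, labels

def spectral_communities(adjacency):
    """Split nodes by the sign of the eigenvector of the second-largest eigenvalue."""
    _, eigvecs = np.linalg.eigh(adjacency)  # eigenvalues in ascending order
    return (eigvecs[:, -2] > 0).astype(int)

def accuracy(predicted, labels):
    """Fraction of correctly assigned nodes, up to swapping the two labels."""
    agreement = np.mean(predicted == labels)
    return max(agreement, 1.0 - agreement)

rng = np.random.default_rng(0)

# Dense regime: expected degree grows with the number of nodes.
dense_adj, labels = sample_sbm(200, p_in=0.5, p_out=0.1, rng=rng)
dense_acc = accuracy(spectral_communities(dense_adj), labels)

# Sparse regime: only a handful of expected neighbours per node.
sparse_adj, _ = sample_sbm(200, p_in=0.03, p_out=0.01, rng=rng)
sparse_acc = accuracy(spectral_communities(sparse_adj), labels)

print(f"dense accuracy:  {dense_acc:.2f}")
print(f"sparse accuracy: {sparse_acc:.2f}")
```

Comparing the two printed accuracies gives a rough feel for the sparsity effect the abstract describes; the exact numbers depend on the random seed and the chosen probabilities.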


Equivariant maps from invariant functions

Ben Blum-Smith , Soledad Villar

Categories: Machine Learning (Statistics) | Machine Learning

2022-09-29

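In equivariant machine learning, the idea is to restrict learning to a hypothesis class in which all functions are equivariant with respect to some group action. Irreducible representations or invariant theory are typically used to parameterize the space of such functions. In this note, we explain a general procedure, attributed to Malgrange, for expressing all polynomial maps between linear spaces that are equivariant with respect to the action of a group $G$, given a characterization of the invariant polynomials on a larger space. The method also parameterizes smooth equivariant maps when $G$ is a compact Lie group.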


From Local to Global: Spectral-Inspired Graph Neural Networks

Ningyuan Huang , Soledad Villar , Carey E. Priebe , Da Zheng , Chengyue Huang , Lin Yang , Vladimir Braverman

Categories: Machine Learning (Statistics) | Machine Learning

2022-09-24

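Graph neural networks (GNNs) are powerful deep learning methods for non-Euclidean data. Popular GNNs are message-passing algorithms (MPNNs) that aggregate and combine signals in a local graph neighborhood. However, shallow MPNNs tend to miss long-range signals and perform poorly on some heterophilous graphs, while deep MPNNs can suffer from issues such as over-smoothing or over-squashing. To mitigate these issues, existing work typically borrows normalization techniques from training neural networks on Euclidean data or modifies the graph structure. Yet these approaches are not well understood theoretically and may increase the overall computational complexity. In this work, we draw inspiration from spectral graph embedding and propose $\texttt{PowerEmbed}$, a simple layer-wise normalization technique to boost MPNNs. We show that $\texttt{PowerEmbed}$ can provably express the top-$k$ leading eigenvectors of the graph operator, which prevents over-smoothing and is agnostic to the graph topology; at the same time, it produces a list of representations ranging from local features to global signals, which avoids over-squashing. We apply $\texttt{PowerEmbed}$ to a wide range of simulated and real graphs and demonstrate its competitive performance, particularly on heterophilous graphs.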
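The idea of layer-wise normalization that recovers leading eigenvectors can be sketched as a subspace (power) iteration. This is only an illustration under stated assumptions: QR orthonormalization stands in for the normalization step, which the abstract does not specify, and a synthetic symmetric matrix with a known spectrum stands in for the graph operator. The function name `power_embed_sketch` is hypothetical and is not the authors' implementation.

```python
import numpy as np

def power_embed_sketch(operator, features, num_layers):
    """Propagate features through the operator, re-orthonormalizing after each layer.

    Returns every intermediate representation, from the raw input (local)
    to the last layer (global). QR is an assumed choice of normalization.
    """
    representations = [features]
    current = features
    for _ in range(num_layers):
        propagated = operator @ current        # one round of message passing
        current, _ = np.linalg.qr(propagated)  # layer-wise normalization
        representations.append(current)
    return representations

rng = np.random.default_rng(0)

# Synthetic symmetric operator with a clear gap after the top-2 eigenvalues.
basis, _ = np.linalg.qr(rng.standard_normal((10, 10)))
eigvals = np.array([3.0, 2.5, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.3, 0.1])
operator = basis @ np.diag(eigvals) @ basis.T
top_eigvecs = basis[:, :2]

features = rng.standard_normal((10, 2))
layers = power_embed_sketch(operator, features, num_layers=30)

# Squared Frobenius norm of the projection onto the top-2 eigenspace;
# it equals 2 when the final representation spans exactly that subspace.
overlap = np.linalg.norm(top_eigvecs.T @ layers[-1]) ** 2
print(f"subspace overlap after 30 layers: {overlap:.6f}")
```

Keeping the whole list `layers`, rather than only the last entry, mirrors the abstract's point that the method yields representations ranging from local features to global signals.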


MarkerMap: nonlinear marker selection for single-cell studies

Nabeel Sarwar , Wilson Gregory , George A Kevrekidis , Soledad Villar , Bianca Dumitrascu

Categories: Machine Learning (Statistics) | Machine Learning

2022-07-28

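Single-cell RNA-seq data allow the quantification of cell type differences across a growing set of biological contexts. However, pinpointing a small subset of genomic features that explains this variability can be ill-defined and computationally intractable. Here we introduce MarkerMap, a generative model for selecting minimal gene sets that are maximally informative of cell type origin and enable whole-transcriptome reconstruction. MarkerMap provides a scalable framework for both supervised marker selection, aimed at identifying specific cell type populations, and unsupervised marker selection, aimed at gene expression imputation and reconstruction. We benchmark MarkerMap's competitive performance against previously published approaches on real single-cell gene expression data sets. MarkerMap is available as a pip-installable package and as a community resource aimed at developing explainable machine learning techniques for enhancing interpretability in single-cell studies.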


Dimensionless machine learning: Imposing exact units equivariance

Soledad Villar , Weichi Yao , David W. Hogg , Ben Blum-Smith , Bianca Dumitrascu

Categories: Machine Learning (Statistics) | Machine Learning

2022-04-02

Units equivariance (or units covariance) is the exact symmetry that follows from the requirement that relationships among measured quantities of physics relevance must obey self-consistent dimensional scalings. Here, we express this symmetry in terms of a (non-compact) group action, and we employ dimensional analysis and ideas from equivariant machine learning to provide a methodology for exactly units-equivariant machine learning: For any given learning task, we first construct a dimensionless version of its inputs using classic results from dimensional analysis, and then perform inference in the dimensionless space. Our approach can be used to impose units equivariance across a broad range of machine learning methods which are equivariant to rotations and other groups. We discuss the in-sample and out-of-sample prediction accuracy gains one can obtain in contexts like symbolic regression and emulation, where symmetry is important. We illustrate our approach with simple numerical examples involving dynamical systems in physics and ecology.
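The step of constructing dimensionless inputs can be sketched with the classic Buckingham Pi construction: exponent vectors in the null space of the dimension matrix define dimensionless products. The pendulum variables, their numerical values, and the helper names below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Columns: period T, length L, mass m, gravity g.
# Rows: exponents of the base units metre, kilogram, second.
DIMENSION_MATRIX = np.array([
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 0, 0, -2],
], dtype=float)

def dimensionless_exponents(dim_matrix, tol=1e-10):
    """Return a basis of the null space of the dimension matrix (one row per group)."""
    _, singular_values, vt = np.linalg.svd(dim_matrix)
    rank = int(np.sum(singular_values > tol))
    return vt[rank:]

def dimensionless_features(values, exponents):
    """Evaluate each dimensionless product prod_j values[j] ** exponents[i, j]."""
    return np.prod(values[None, :] ** exponents, axis=1)

exponents = dimensionless_exponents(DIMENSION_MATRIX)

# Hypothetical measurements in SI units: T = 2 s, L = 1 m, m = 0.5 kg, g = 9.81 m/s^2.
si_values = np.array([2.0, 1.0, 0.5, 9.81])

# The same measurements in centimetres, grams, and milliseconds.
unit_scales = np.array([100.0, 1000.0, 1000.0])
conversion = np.prod(unit_scales[:, None] ** DIMENSION_MATRIX, axis=0)
cgs_like_values = si_values * conversion

features_si = dimensionless_features(si_values, exponents)
features_other = dimensionless_features(cgs_like_values, exponents)
print(features_si, features_other)
```

For this pendulum example the null space is one-dimensional and proportional to the exponents of $T^2 g / L$, so the single feature is unchanged by the change of units. A model trained only on such features cannot depend on the unit system.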


Invalidator: Automated Patch Correctness Assessment via Semantic and Syntactic Reasoning

Thanh Le-Cong , Duc-Minh Luong , Xuan Bach D. Le , David Lo , Nhat-Hoa Tran , Bui Quang-Huy , Quyet-Thang Huynh

Category: Machine Learning

2023-01-03

In this paper, we propose a novel technique, INVALIDATOR, to automatically assess the correctness of APR-generated patches via semantic and syntactic reasoning. INVALIDATOR reasons about program semantics via program invariants, and it captures program syntax via language semantics learned from a large code corpus using a pre-trained language model. Given a buggy program and the developer-patched program, INVALIDATOR infers likely invariants on both programs. INVALIDATOR then determines that an APR-generated patch overfits if it (1) violates correct specifications or (2) maintains error behaviors of the original buggy program. If the approach fails to identify an overfitting patch based on invariants, INVALIDATOR uses a model trained on labeled patches to assess patch correctness based on program syntax. The benefit of INVALIDATOR is three-fold. First, INVALIDATOR leverages both semantic and syntactic reasoning to enhance its discriminative capability. Second, INVALIDATOR does not require new test cases to be generated; it relies only on the current test suite and uses invariant inference to generalize the behaviors of a program. Third, INVALIDATOR is fully automated. We conducted our experiments on a dataset of 885 patches generated for real-world programs in Defects4J. The results show that INVALIDATOR correctly classified 79% of overfitting patches, detecting 23% more overfitting patches than the best baseline. INVALIDATOR also substantially outperforms the best baselines by 14% and 19% in terms of Accuracy and F-Measure, respectively.
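The two invariant-based overfitting checks can be sketched with plain set operations. The sketch assumes that "correct specifications" are invariants shared by the buggy and developer-patched programs, and that "error behaviors" are invariants of the buggy program that the developer patch removes; the abstract does not spell out these definitions, so both are assumptions. The function name and the invariant strings are hypothetical, and invariant inference itself is not shown.

```python
def is_overfitting(buggy_invariants, developer_invariants, patch_invariants):
    """Apply the two invariant-based checks to an APR-generated patch.

    Each argument is a set of invariant strings inferred for one program version.
    """
    # Assumption: behavior preserved by the developer fix is a correct specification.
    correct_specs = buggy_invariants & developer_invariants
    # Assumption: behavior removed by the developer fix is an error behavior.
    error_behaviors = buggy_invariants - developer_invariants

    violates_specs = not correct_specs <= patch_invariants
    keeps_errors = bool(error_behaviors & patch_invariants)
    return violates_specs or keeps_errors

# Hypothetical invariants for a function returning an index into a list.
buggy = {"result >= 0", "result <= len(xs)"}
developer = {"result >= 0", "result < len(xs)"}

good_patch = {"result >= 0", "result < len(xs)"}
patch_keeping_bug = {"result >= 0", "result <= len(xs)"}
patch_breaking_spec = {"result < len(xs)"}

print(is_overfitting(buggy, developer, good_patch))           # False
print(is_overfitting(buggy, developer, patch_keeping_bug))    # True
print(is_overfitting(buggy, developer, patch_breaking_spec))  # True
```

When neither check fires, the abstract describes a fallback to a learned, syntax-based classifier; that stage is outside the scope of this sketch.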


State and parameter learning with PaRIS particle Gibbs

Gabriel Cardoso , Yazid Janati El Idrissi , Sylvain Le Corff , Eric Moulines , Jimmy Olsson

Category: Machine Learning (Statistics)

2023-01-02

Non-linear state-space models, also known as general hidden Markov models, are ubiquitous in statistical machine learning, being the most classical generative models for serial data and sequences in general. The particle-based, rapid incremental smoother PaRIS is a sequential Monte Carlo (SMC) technique allowing for efficient online approximation of expectations of additive functionals under the smoothing distribution in these models. Such expectations appear naturally in several learning contexts, such as maximum likelihood estimation (MLE) and Markov score climbing (MSC). PaRIS has linear computational complexity and limited memory requirements, and it comes with non-asymptotic bounds, convergence results and stability guarantees. Still, being based on self-normalised importance sampling, the PaRIS estimator is biased. Our first contribution is to design a novel additive smoothing algorithm, the Parisian particle Gibbs (PPG) sampler, which can be viewed as a PaRIS algorithm driven by conditional SMC moves, resulting in bias-reduced estimates of the targeted quantities. We substantiate the PPG algorithm with theoretical results, including new bounds on bias and variance as well as deviation inequalities. Our second contribution is to apply PPG in a learning framework, covering MLE and MSC as special examples. In this context, we establish, under standard assumptions, non-asymptotic bounds highlighting the value of bias reduction and the implicit Rao--Blackwellization of PPG. These are the first non-asymptotic results of this kind in this setting. We illustrate our theoretical results with numerical experiments supporting our claims.


Integrating Semantic Information into Sketchy Reading Module of Retro-Reader for Vietnamese Machine Reading Comprehension

Hang Thi-Thu Le , Viet-Duc Ho , Duc-Vu Nguyen , Ngan Luu-Thuy Nguyen

Category: Natural Language Processing

2023-01-01

Machine Reading Comprehension has become one of the most advanced and popular research topics in the field of Natural Language Processing in recent years. Classifying whether a question is answerable is a significant sub-task in machine reading comprehension, yet it has received little study. Retro-Reader is one of the works that addresses this problem effectively. However, the encoders of most traditional machine reading comprehension models, and of Retro-Reader in particular, do not fully exploit the semantic information of the context. Inspired by SemBERT, we use semantic role labels from the SRL task to add semantics to pre-trained language models such as mBERT, XLM-R, and PhoBERT. This experiment was conducted to compare the influence of semantics on answerability classification for Vietnamese machine reading comprehension. Additionally, we hope this experiment will enhance the encoder of the Retro-Reader model's Sketchy Reading Module. The improved Retro-Reader encoder with semantics was applied for the first time to the Vietnamese Machine Reading Comprehension task and obtained positive results.
